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A sparse grid method for Bayesian uncertainty 
quantification with application to large eddy 
simulation turbulence models* 
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Abstract There is wide agreement that the accuracy of turbulence models suffer from 
their sensitivity with respect to physical input data, the uncertainties of user-elected 
parameters, as well as the model inadequacy. However, the application of Bayesian 
inference to systematically quantify the uncertainties in parameters, by means of 
exploring posterior probability density functions (PPDFs), has been hindered by 
the prohibitively daunting computational cost associated with the large number of 
model executions, in addition to daunting computation time per one turbulence 
simulation. In this effort, we perform in this paper an adaptive hierarchical sparse 
grid surrogate modeling approach to Bayesian inference of large eddy simulation 
(LES). First, an adaptive hierarchical sparse grid surrogate for the output of forward 
models is constructed using a relatively small number of model executions. Using 
such surrogate, the likelihood function can be rapidly evaluated at any point in 
the parameter space without simulating the computationally expensive LES model. 
This method is essentially similar to those developed in [62] for geophysical and 
groundwater models, but is adjusted and applied here for a much more challenging 
problem of uncertainty quantification of turbulence models. Through a numerical 
demonstration of the Smagorinsky model of two-dimensional flow around a cylinder 
at sub-critical Reynolds number, our approach is proven to significantly reduce the 
number of costly LES executions without losing much accuracy in the posterior 
probability estimation. Here, the model parameters are calibrated against synthetic 
data related to the mean flow velocity and Reynolds stresses at different locations in 
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the flow wake. The influence of the user-elected LES parameters on the quality of 
output data will be discussed. 

Key words: stochastic PDEs, turbulence models, Navier-Stokes equations, adaptive 
hierarchical sparse grid methods, large eddy simulation, Bayesian inference 


1 Introduction 


Eor most turbulent flows encountered in industrial applications, the cost of direct 
numerical simulation (DNS) would exceed the capacity of current computational 
resource (and possibly continue to do so for the foreseeable future). As a result, many 
important decisions affecting our daily lives (such as climate policy, biomedical 
device design, pollution dispersal and energy efficiency improvement) are informed 
from simulations of turbulent flows by various models of turbulence. The accuracy 
of estimated quantities of interest (Qols) by such models, however, frequently suffers 
from the uncertainties on the physical input data, user-chosen model parameters 
and the subgrid model. It is ideal to be able to incorporate these uncertainties in the 
predictions of Qols. 

The basic approach used for approximating turbulent flows has been to compute 
the time- and space-filtered velocity and pressure, which are less computationally 
demanding and of main technical interest, instead of solving for the pointwise 
velocity and pressure prescribed by the standard Navier-Stokes equations. The use of 
turbulence models leads to a level of uncertainty in the performance and inaccuracy 
in the simulation results, due to user-chosen model parameters whose true or optimal 
values are not well-known (parametric uncertainty), or the inherent inability of the 
model to reproduce reality (structural uncertainty). With the fast growth in available 
computational power, the literature on uncertainty quantification for fluid mechanics 
modeling has grown extensively recently. Many stochastic numerical methods have 
been developed, analyzed and tested for simulations of fluid flows with uncertain 
physical and model parameters, see, e.g., [18,20,52,56,60,61]. Sensitivity analysis 
of LES to parametric uncertainty was conducted in [35]. Statistical methods to 
capture structural uncertainties in turbulence models were presented in [17,23,24]. 
Eor inverse uncertainty quantification, we refer to [13,45] (Bayesian inference for 
Reynolds-averaged Navier Stokes (RANS) models) and [16] (adjoint based inverse 
modeling). 

Bayesian inference has become a valuable tool for estimation of parametric and 
structural uncertainties of physical systems constrained by differential equations. 
Sampling techniques, such as Markov chain Monte Carlo (MCMC), have frequently 
been employed in Bayesian inference [19,33,50]. However, MCMC methods [28,58, 
59] are, in general, computationally expensive, because a large number of forward 
model simulations is needed to estimate the PPDF and sample from it. Given the fact 
that one solution of turbulence models easily takes thousands of computing hours, 
MCMC simulations in many CFD applications would require prohibitively large 
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computational budgets. Perhaps due to this demand, efforts on model calibration 
up until now have been limited on the least expensive turbulence model — RANS 
equations [13,45]. To make Bayesian inference tractable for other types of closure 
models, including LES, it is essential to perform the MCMC sampling in a time and 
cost effective manner. 

A strategy to improve the efficiency of MCMC simulations is surrogate modeling, 
which has been developed in a wide variety of contexts and disciplines, see [49] 
and the reference therein. Surrogate modeling practice seeks to approximate the 
response of an original function (model outputs or the PPDF in this work), which 
is typically computationally expensive, by a cheaper-to-run surrogate. The PPDF 
can then be evaluated by sampling the surrogate directly without forward model 
executions. Compared to conventional MCMC algorithms, this approach is advan¬ 
tageous that it significantly reduces the number of forward model executions at a 
desired accuracy and allows sampling the PPDF in parallel. Several methods can 
be employed to construct the surrogate systems, including polynomial chaos ex¬ 
pansion [22], stochastic Galerkin [3], stochastic collocation [2], and polynomial 
dimensional decomposition [48], to list a few. For problems where the quantities 
of interest have irregular dependence with respect to the random parameters, such 
as those studied herein, it should be noted that approximation approaches that use 
global polynomials are generally less effective than those allowing for multi-level, 
multi-scale decomposition. In this direction, one can develop multi-level hierarchical 
subspaces and employ adaptive grid refinement to concentrate grid points on the sub- 
domains with a locally high variation of solutions, resulting in a significant reduction 
in the number of grid points. 

In this paper, we present an adaptive hierarchical sparse grid (AHSG) surrogate 
modeling approach to Bayesian inference of turbulence models, in particularly 
LES. The key idea is to place a grid in the parameter space with sparse parameter 
samples, and the forward model is solved only for these samples. Compared to the 
regular full grid approach, sparse grid preserves the high level of accuracy with less 
computational work, see [4,20,26,27,42,43]. As sparse grid methods require the 
bounded mixed derivative property, which is open for the solutions of Navier-Stokes 
equations and turbulence models in general, a locally adaptive refinement method, 
guided by hierarchical surpluses, is employed to extend sparse grid approach to 
possible non-smooth solutions. This refinement strategy is different from dimension- 
adaptive refinement [21], which puts more points in dimensions of higher relevance 
and more in line with those in [25, 34,46]. Although similar surrogate methods 
has been studied in [36,62] for geophysical and groundwater models, we tackle 
here a more challenging problem of uncertainty quantification of turbulence models. 
Indeed, turbulent flows are notorious for their extremely complex nature and the 
non-smoothness of the surface of LES output data may weaken the accuracy of the 
surrogate. The applicability of surrogate modeling techniques to LES therefore needs 
thorough investigation. In this work, we will demonstrate the accuracy and efficiency 
of the surrogate model through a numerical example of the classical Smagorinsky 
closure model of turbulent flow around a circular cylinder at a sub-critical Reynolds 
number {Re = 500), which is a benchmark test case for LES. The computation will 
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be conducted for the two-dimensional flow, whose outputs have similar patterns as 
three-dimensional simulation, but which is significantly less demanding in computing 
budget. The synthetic data of velocity and Reynolds stresses at different locations in 
the flow wake are utilized for the calibration. 

This work is only one piece in the complete process of calibration and validation of 
LES models to issue predictions of Qols with quantified uncertainties, and many open 
questions remain. We do not attempt to fit the numerical solutions with physical data 
herein, as the two-dimensional model has been known to show remarkable discrep¬ 
ancy with the experiment results. Applying our framework to the three-dimensional 
simulation for parameter calibration against real-world data would be the next logical 
step. Another important problem is to evaluate and compare the performance of our 
AHSG with other surrogate methods (including some listed above) in this process. 
This would be conducted in future research. Also, characterization and quantification 
of the structural inadequacy and comparison of different competing LES models are 
beyond the scope of this study. 

The rest of the paper is organized as follows. The Bayesian framework and the 
adaptive hierarchical sparse grid method of constructing the surrogate system are 
described in §2. In §3, we give a detailed description of the Smagorinsky model of 
sub-critical flow around a cylinder. The performance of surrogate modeling approach 
and results of the Bayesian analysis are presented in §3.4. Einally, discussions and 
conclusions appear in §4. 


2 Adaptive hierarchical sparse grid methods for surrogate 
modeling in Bayesian inference 


2.1 Bayesian inference 


Consider the Bayesian inference problem for a turbulence model 


d = f{d) + e, 


( 2 . 1 ) 


where d = is a vector of Nf reference data, d = (0i ,..., ) is a 

vector of Nq model parameters, f{ 6 ) is the forward model, e.g., Smagorinsky 
model (see §3), with Ne inputs and A/j outputs, and £ is a vector of residuals, 
including measurement, model parametric and structural errors. (Nonlinear model 
d = E{f,e,e) can be considered as well, but leads to more complicated likelihood 
functions, as £ = S~^ {f, 6 ){d)). 

The posterior distribution P{6\d) of the model parameters 6, given the data d, 
can be estimated using the Bayes’ theorem [10] via 


p{d\d) = 


L{d\d)P{d) 

jL{e\d)p{e)de' 


( 2 . 2 ) 
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where P(0) is the prior distribution and L(0|rf) is the likelihood function that mea¬ 
sure “goodness-of-fit” between model simulations and observations. In parametric 
uncertainty quantification, the denominator of the Bayes’ formula in equation (2.2) 
is a normalization constant that does not affect the shape of the PPDF. As such, in 
the hereafter discussion concerning building surrogate systems, the notation P{Q\d) 
or the terminology PPDF will only refer to the product L{0\d)P{0). The prior distri¬ 
bution represents knowledge of the parameter values before the data d is available. 
When prior information is lacking, a common practice is to assume uniform dis¬ 
tributions with parameter ranges large enough to contain all plausible values of 
parameters. 

Selection of appropriate likelihood functions for a specific turbulence simulation 
is an open question. A commonly used formal likelihood function is based on the 
simplistic assumption that the residual term £ in (2.1) follows a multivariate Gaussian 
distribution with mean zero and prescribed standard deviations, which leads to the 
Gaussian likelihood function: 


L{6\d) = exp 




(MVN) 


In this paper, we assume that the residual errors are independent, i.e., the covariance 
matrix E is diagonal. To describe the correlation of the errors or the inadequacy 
of turbulence models, other covariance matrices can also be used (and lead to in¬ 
consistent results) 113,45]. In general, the formal approach has been criticized for 
relying heavily on residual error assumptions that do not hold. Alternatively, informal 
likelihood functions are proposed as a pragmatic approach to implicitly account for 
errors in measurements, model inputs and model structure and to avoid over-fitting 
to reference data 18]. Definition of informal likelihood functions is problem specific 
in nature, and there has been no consensus on which informal likelihood functions 
outperforms others. For the sake of illustration, in §3.4, the exponential informal 
likelihood function is used for the numerical example (together with (MVN)). It 
reads: 


/ 

L(0|J)=exp 

V 



(EXP) 


where d is the mean of observations, / is the mean of the outputs of forward model, 
and ^ is a scaling constant. For some other widely used informal likelihood functions, 
see 154]. 
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2.2 Adaptive hierarchical sparse grid methods for construction of 
the surrogate PPDF 

The central task of Bayesian inference is to estimate the posterior distribution P{d\d). 
It is often difficult to draw samples from the PPDF directly, so the MCMC methods, 
such as the Metropolis-Hastings (M-H) algorithm [19] and its variants, are normally 
used for the sampling process. In practice, the convergence of MCMC methods is 
often slow, leading to a large number of model simulations. To tackle this challenge, 
surrogate modeling approaches seek to build an approximation (called the surrogate 
system) fox P{0\d), then the MCMC algorithm draws samples from it directly with¬ 
out executing the forward model. With this approach, the main computational cost 
for evaluating the PPDF is now transferred to the surrogate construction step. Natu¬ 
rally, an approximation method which requires minimal number of grid points in the 
parameter space, while not surrendering much accuracy is desired. The methodology 
we utilize to construct the surrogate system, presented in this subsection, is similar to 
the method introduced in [62]. Since the method can be applied to functions governed 
by partial differential equations, not limited io P{6\d) or /(0), a generic notation 
r}{d) : Q ^Ris used for the description. The following assumptions are needed: 

(a) The domain ^2 is a rectangle, i.e., ^2 = x ... x , where ^2„ c M, ^ = 

(b) The joint probability density function p(0) is of product-type: 

Ne 

P(^) ~ 

n=l 

where : ^2„ —> M are univariate density functions. 

(c) The univariate domains and density functions are identical: 

Qi = ... = Qnq ; Pi = • • • = pNe ^ 
yielding the same /-level univariate quadrature rules 

= = =: Q,[.]. 

(d) The univariate quadrature rules are nested. 


2.2.1 Adaptive sparse grid interpolation 

The basis of constructing the sparse grid approximation in the multi-dimensional 
setting is the one-dimensional (1-D) hierarchical interpolation. Consider a function 
T] (0) : [0,1] -> M. The 1-D hierarchical Lagrange interpolation formula is defined by 

i(Kme):='£AUtme), 

i=0 


(2.3) 



A sparse grid method for Bayesian inference of LES turbulence models 


7 


where K is the resolution level, and the incremental interpolation operator AUi[r{\ is 
given as 

Mi 

AUi[r]]{e) := £ c,■;■(/.,^(e), / = 0,... ,K. (2.4) 

7=1 

For 7 = 1,..., m/, and ctj in (2.4) are the piecewise hierarchical basis functions 
[12,62] and the interpolation coefficients for AUi[r{\, respectively. For / = 0,... ,^, 
the integer m/ in (2.4) is the number of interpolation points involved in AUi [t]] , which 
is defined by 

mo = 1 , mi = 2 , and m^ = for i > 2 . 

A uniform grid, denoted by AXi = can be utilized for the incremental 

interpolant AUi[ri]. The abscissas of AXi are defined by 

2/-1 

00 , 1 = 0 . 5 , 0pi=O, 0£2 = 1, and 6ij = — - for 7 = 1... ,m/, / > 2. 

1 

k=0 

Then, the hierarchical grid for Uk[ti] (6) is defined by 47^ = 47/. 

Based on the one-dimensional hierarchical interpolation, we can construct an ap¬ 
proximation for a multivariate function r] ( 0 ) : [ 0 , 1 ]^^ —> M, where 0 = ( 0 i,..., Onq), 
by hierarchical interpolation formula as 

lK[rim := Mnm (2.5) 

|i|<^ 

and the multi-dimensional incremental interpolation operator 4i[t7] is defined by 
A[T]]( 0 ) = £ cij^y( 0 ), 


where i := (/i,...,//z^) is a multi-index indicating the resolution level of Ai[ri], 

|i| = /i H- h iNe . ) •= YlnLi and the multi-index set Bi is defined 

by = {j G I 7 „ = 1,.. = 1,... ,7Ve}. As such, the grids for Ai[ri] and 

Xk[b] are defined by AHi := AXi^ x • • • x and Hk '= U|i|<j^Z\Hi. 

In this paper, we employ the piecewise linear hierarchical basis [12,62] and the 
surplus Ci j can be explicitly computed as 

r'o,! =^o[^](^o,i) =Xo[ri]{Oo^i) = t7(0o,i), 

r‘i,j = A['0] ~Xk-i [' 0 ] for |i| = ^ > 0, 

as the supports of basis functions are mutually disjoint on each subspace. As discussed 
in [12], when the function r] (0) is smooth with respect to 0, the magnitude of the 
surplus Ci j will approach to zero as the resolution level K increases. Therefore, 
the surplus can be used as an error indicator for the interpolant TkIb] in order to 
detect the smoothness of the target function and guide the sparse grid refinement. 
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In particular, each point 6^ of the isotropic level-^ sparse grid Hk is assigned two 
children in each ^-th direction, represented by 


^”1 ( 0 ij) — : • • • 5 5 ^in + l,27n-H ^in+lJn+l ’ * * * ’ JNq J ’ 

= ( Jl ’ • • • ’ ’ ^«n + l,27n 5 ^in+lJn+l ’ * * * ’ JNq ) ’ 


( 2 . 6 ) 


for ^ = 1,... ,W 0 . Note that the children of each sparse grid point on level |i| belong 
to the sparse grid point set of level |i| + 1. The basic idea of adaptivity is as follows: 
for each point whose magnitude of the surplus is larger than the prescribed error 
tolerance, we rehne the grid by adding its children on the next level. More rigorously, 
for an error tolerance a, the adaptive sparse grid interpolant is defined on each 
successive interpolation level as 


^KArim := I I (2-7) 


where the multi-index set is defined by modifying the multi-index set B[, i.e., 
= {j G ^il |Cj I > a}. The corresponding adaptive sparse grid is a sub-grid of the 
level-^ isotropic sparse grid Hk, with the grid points becoming concentrated in the 
non-smooth region. In the region where 77 ( 0 ) is very smooth, this approach saves a 
significant number of grid points but still achieves the prescribed accuracy. 


2.2.2 Algorithm for constructing the surrogate PPDF 

In the forthcoming numerical illustration, a surrogate PPDF will be constructed based 
on the sparse grid method, discussed above, with the use of the following procedure. 

Algorithm 2.1 

• STEP 1: Determine the maximum allowable resolution K of the sparse grid by 
analyzing the trade off between the interpolation error and computational cost. 
Determine the error tolerance a. 

• STEP 2: Generate the isotropic sparse grid at some starting coarse level L Until 
the maximum level K is reached or the magnitudes of all surpluses on the last 
level are smaller than a, do the following iteratively: 

- Step 2.1: Simulate the turbulence model f{6) at each grid point Oi j G TLi. 

- Step 2.2: Construct the sparse grid interpolant [/] (®) based on formula 
(2.7). 

- Step 2.3: Generate the adaptive sparse grid for the next level based on the 
obtained surpluses. Set i : = £-\-1 and go back to Step 2.1. 

• STEP 3: Construct an approximate likelihood function, denoted by L{d\d), by 
substituting X£^a[f]far f into the likelihood formula using, e.g., (MVN) or (EXP). 

• STEP 4: Construct the surrogate PPDEP{0\d) via 
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p{e\d)ocL{e\d)p{e). 

After the surrogate is constructed, an MCMC simulation is used to explore P{Q\d). 
Using our approach, drawing the parameter samples does not require any model 
executions but negligible computational time for polynomial evaluation using the 
surrogate system. The improvement of computational efficiency by using surro¬ 
gate PPDF is more impressive when increased samples are drawn in the MCMC 
simulation. 

Finally, it is worth discussing the flexibility of grid adaptive refinement strategies. 
It is known that in calibration problems of turbulence models, different likelihood 
models could lead to conflicting posterior distributions [13,45]. Moreover, for a flow 
problem, experimental data given by different authors is sometimes inconsistent. 
There is also a wide variation of the physical quantities to be measured and recorded. 
Naturally, one would desire a surrogate modeling method that allows for the use of a 
variety of likelihood functions and data sets, at little cost, once the surrogate system 
has been built. An adaptive refinement strategy based on the smoothness of the 
likelihood functions [62] is obviously the least flexible, since the grid is likelihood- 
function-specific. The approach we apply in this work, i.e., an adaptive method that 
is guided by the smoothness of output interpolant, allows the use of an universal 
surrogate of the output, for different choices of likelihood functions and data of the 
same physical quantities. The surrogate for the output is, however, more expensive 
than that built directly for the likelihood function in the former approach, since grid 
points may be generated in the low density region of the likelihood where the forward 
simulations are wasteful. The most versatile method is certainly the non-adaptive, 
full sparse grid method, but the surrogate is also constructed with highest cost in 
this case. To this end, one has to sacrifice the flexibility of the sparse grid surrogate 
to improve the efficiency. The demand of investigating posterior distribution over 
different likelihood functions and data sets and the computational budget need to be 
balanced before an adaptive refinement strategy is determined. 


3 Application to large eddy simulation of sub-critical flow 
around a circular cylinder 

3.1 Parametric uncertainty of Smagorinsky model 


In LES practice, the time dependent, incompressible Navier-Stokes equations are fil¬ 
tered by, e.g., box filter, Gaussian filter, differential filter and the governing equations 
are given by 


Ut ' (u u)-vAu-^V p— V ’ (Ivt'^^u) =/, 

V = 0, 


(3.1) 
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where u is the velocity at the resolved scales, p is the corresponding pressure, > 0 
is the eddy viscosity and is the symmetric part of V operator, see [7]. 

The most common choice for v^, which is studied herein, is known in LES as the 
Smagorinsky model [41,53] in which 

Vr=4|V"«l, (3.2) 

where is = Cs5 and 1-1 = is called the Smagorinsky lengthscale. 

There are two model calibration parameters in this term - the Smagorinsky constant Cg 
and the filter width 5. The pioneering analysis of Lilly [32], under some optimistic 
assumptions, proposed that Cg has a universal value 0.17 and is not a “tuning” 
constant. This universal value has been found later not the best choice for most 
LES computations and various different values ranging from 0.1 to 0.25 are usually 
selected leading to improved results, see, e.g., [1,9,14,15, 37-40]. The optimal 
choice for Cg depends on the flow problems considered and even may be different 
for different regions in a flow field. Indeed, this poses a major drawback of the 
Smagorinsky model. 

The second calibration parameter - the filter width 5 - characterizes the short 
lengthscale fluctuations to be removed from the flow fields. Ideally, the filter width 
should be put at the smallest persistent, energetically significant scale (the flow 
microscale), which demarcates the deterministic large eddies and isotropic small 
eddies, [47]. Unfortunately, such a choice is infeasible, since the flow microscale is 
seldom estimated. Instead, due to the fact that LES requires the spatial resolution 
h to be proportional to 5, the usual practice is to specify the grid to be used in the 
computation, and then take the filter width according to the grid size. The specification 
of grid and filter without knowledge of the microscale could lead to poor simulation. 

An additional calibration parameter involves in near wall treatment. The correct 
behavior of Smagorinsky eddy viscosity Vj near the wall is Vj — 0, since there 
is no turbulent fluctuation there. In contrast, the formulation (3.2) is nonzero and 
introduces large amounts of dissipation in the boundary layer. One approach to 
overcome this deficiency is to damp ig as the boundary is approached by the van 
Driest damping function [57]. The van Driest scaling reads: 

^s=Cs^(\-e-y^''l^^"y’, (3.3) 

where is the distance from wall in wall units, is van Driest constant ascribed 
the value A+ = 25. Various different values of {n^p) have been used - the most 
commonly chosen are (1,1) and (3,0.5), [51]. Eor simplicity, in this work, we fix 
n = \ and treat p only as a calibration parameter. The variation of p alone can 
capture the full spectrum of near wall scaling: /? = 0 means no damping function is 
applied, while a large p associates with fast damping. We call p van Driest damping 
parameter. 
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3.2 Sub-critical flow around a circular cylinder 


The flow concerned in this study corresponds to a time-dependent flow through 
a channel around a cylinder. External flows past objects have been the subject of 
numerous theoretical, experimental and numerical investigations because of their 
many practical applications, see [5,6,44] and the reference therein. In the sub-critical 
Reynolds number range (300 < < 2 x 10^), these flows are characterized by 

turbulent vortex streets and transitioning free shear layers. 

We consider the two-dimensional flow around a cylinder of diameter Z) = 0.1 in 
rectangular domain of size 2.2 x 1.4, consisting a 5D upstream, \1D downstream 
and ID in lateral directions. We employ the finite element method with second order 
Taylor-Hood finite element and polygonal boundary approximation. Our computation 
is carried out on triangular meshes generated based on Delaunay-Voronoi algorithm 
and refined around the cylinder. The ratio of number of mesh points on the top/bottom 
boundaries, left/right boundaries and cylinder boundary is fixed at 3:2:4. As common 
practice, the filter width is chosen locally at each triangle as the size of the current 
triangle. Its value therefore varies throughout the domain, and is roughly 10 times 
smaller near the cylinder than that in the far field. Since the synthetic data will be 
taken in the near wake region, for simplicity, we characterize 5 by the value of the 
filter width on the cylinder surface. 



Fig. 1 A computational grid used in our study on LES of turbulent flow past a cylinder with 
5 = 7r/480. 


The Smagorinsky model with van Driest damping (3.1)-(3.3) is considered with 
V = 2 X 10“"^, / = 0, r = 12 and At = 0.01. The statistics are compiled over the 
last 7 time units, equivalent to a period of ^ 15 vortex shedding cycles. The inflow 
and outflow velocity is (Y^y(1.4 — y),0). The mean velocity at the inlet is Z/q = 1- 
No-slip boundary conditions are prescribed along the top and bottom walls. Based on 
Z/q and the diameter of the cylinder D, the Reynolds number for this flow is Re = 500, 




12 


Hoang Tran, Clayton G. Webster, and Guannan Zhang 


in the sub-critical range. The temporal discretization applied in the computation is 
the Crank-Nicolson scheme. Denoting quantities at time level 4 by a subscript k, the 
time stepping scheme has the form: 

Uk-Uk-l Uk-^Uk-l l/_ N , Y7- 

--V4- - - \--{uk-'^Uk + Uk-l-Vuk-l)+'^p,^ 

-{V-{VT{uk)y^Uk) + y-{VT{uk-l)V^Uk-l))=0, 

V^Uk = 0. 


System (3.4) is reformulated as a nonlinear variational problem in time step 4 . This 
problem is solved iteratively by a fixed point iteration. Let be an initial guess. 

Given the iterate is computed by solving 


77m+l 77 


U 


Uk—l . 

- vA — 


77m+l I 77 


+ Uk-\ 


At 




' 77^+1 


- (V- (vr«)V^M^+i) +V - (vr(Mn)V^M^_i)) = 0, 


(3.5) 


-m+l 


The fixed point iteration in each time step is stopped if the Euclidean norm of 
the residual vector is less than 10“^^. The spatial and temporal discretizations we 
use herein are similar to [30,31], in which they were applied to direct numerical 
simulations of flow around a cylinder at Reynolds number Re = 100. 


3.3 The prior PDF and calibration data 


We will exploit Bayesian calibration for three model parameters Q, p and 5. The 
uniform prior PDF of the uncertain parameters is assumed. The searching domains 
for Cs and p are [0,0.2] and [0,2] respectively, covering their plausible and com¬ 
monly selected values. The range of the prior PDF of 5, on the other hand, would 
significantly affect the computational cost; since the filter width is proportional to the 
spatial resolution. Thus, to reduce the cost of flow simulations, the searching domain 
for 5 is set to be [;r/600, ;r/200], corresponding to relatively coarse resolutions 
where the grid spacing on the cylinder surface ranges from ^ 2 to 6 wall units. As we 
shall see, the response surfaces tend to be more complicated for the low-resolution 
simulation, possibly due to the non-physical oscillations in the underresolved so¬ 
lutions reflecting in the probability space. As a result, coarse grids pose a greater 
challenge for the surrogates to precisely describe the true outputs and are suitable for 
our purpose of verifying the accuracy of the surrogate modeling approach. Figure 2 
shows the distribution of instantaneous vorticity at r = 20 in the near wake region for 
two different choices of turbulence parameters. We can see that the simulated flows 
display laminar vortex shedding, as expected for LES of flows past bluff bodies. The 
difference in phase of vortex shedding in two simulations is recognizable. 
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Fig. 2 Intantaneous vorticity at f = 20 generated by two different choices of model parameters. 
Left: Q = 0.2, p = 0,d = 7r/480. Right: Q = 0.05, p = 0,d = n/120. 


The synthetic data are generated by solving Smagorinsky model (3.1)-(3.3) with 
Cs = 0.15^ p = 0.05 and 5 = ;r/480. The data sets used for calibration process are 
taken at 11 stations in a distance of « IZ) downstream. Specifically, these points 
locate equidistantly on the vertical line v = 0.65 between y = 0.6 and y = 0.8. For 
each point, the data of average streamwise and vertical velocities, denoted by U and 
y, as well as total streamwise, vertical and shear Reynolds stresses, i.e., {u'u'), (v'v') 
and {u'V), are selected, giving a total of 55 reference data. For clarity, the bounds 
of uniform prior PDFs and the true values of calibration parameters are listed in 
Table 1. In Figure 3, the measurements of interested velocities and Reynolds stresses 
along V = 0.65 are plotted for some typical simulations. We observe that except for 
5 = 71/200, the approximated quantities are quite smooth and have expected patterns, 
see [5]. Certainly, the plots show significant differences among different models. In 
practice, LES models which give distinctly poor results such as those at 5 = 71/200 
could be immediately ruled out from the calibration process, informed by the fact that 
the wall-adjacent grid points lie outside the viscous sublayer. However, it is useful 
here to examine the response surfaces and the accuracy of the surrogate systems 
in these cases, and we choose to include these large filter widths in the surrogate 
domain instead. 



True value 

r 

Smagorinsky constant Cs 

0.15 

[0,0.2] 

van Driest parameter p 

0.5 

[0,2] 

Filter width 5 

;r/480 

[;r/600,;r/200] 


Table 1 The true parameter values and the initial searching regions for model calibration. 


Finally, it is worth mentioning that Smagorinsky model coefficients are not the 
only parameters that influence the quality of LES solutions. Indeed, other numerical 
parameters such as time step size and averaging time also have significant impacts, 
see, e.g., [11,51]. While an estimation of their influence is not conducted here, we 
need to ensure that the errors caused by them do not dominate the uncertainties 
in the calibration parameters. A simple validation test is carried out on the flow 
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Fig. 3 Total resolved Reynolds stresses and average velocities along the vertical line at ID down¬ 
stream for some Smagorinsky models. 


statistics generated by Smagorinsky model of Q = 0, /? = 0 and 5 = ;r/480. The 
flow simulation is replicated first with the temporal resolution refined by a factor 
of two, i.e., At = 0.005, and then with a doubled averaging period, i.e., by setting 
r = 19. We also conduct another simulation in which the zero gradient replaces 
Dirichlet outflow boundary condition to justify that the numerical oscillation at the 
downstream boundary does not disturb the inner domain. The maximum change 
in five velocity and Reynold stress profiles of interest in these modified models is 
presented in Table 2. We see that among three investigated source of numerical errors, 
the temporal resolution is the most prominent, as it makes up approximately 80% 
of the change in all data. More importantly. Table 2 reveals that the total maximum 
change is approximately 0.05 in the vertical Reynolds stress data and 0.025 for 
other quantities. Numerical errors of the synthetic calibration data, as well as model 
outputs, are expected to be around these values. In the uncertainty analysis following, 
for the (MVN) likelihood model, we will assume that the reference data are corrupted 
with Gaussian random noise of 0.1. 


Component modified U V {u'u') (v'v') {u'v') 

Time step 0.0148 0.0213 0.0202 0.0371 0.0186 
Averaging period 0.0048 0.0025 0.0014 0.0011 0.0045 
Outflow BC 0.0022 0.0029 0.0016 0.0061 0.0020 


Table 2 The maximum change in average velocity and Reynold stress profiles under the modifica¬ 
tions of time step, averaging period and outflow BC. 
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3.4 Results and discussions 


This section justifies the accuracy and efficiency of the surrogate modeling method 
described in §2, when applied to the numerical example of two-dimensional flow 
around a cylinder specified in §3. We utilize the software package FreeFem++ [29] in 
solving the Smagorinsky discretization scheme. The adaptive sparse grid interpolation 
and integration schemes are generated using functions in the TASMANIAN toolkit 
[55]. The DRAM algorithm [28] is chosen for MCMC sampling of the surrogate 
PPDF. 

The surrogate system for outputs is constructed using the linear basis functions, 
first on the standard sparse grid of level 5, then the grids are refined adaptively up 
to level 8. The total numbers of model executions needed for the four interpolants 
are 177, 439, 1002 and 2190, respectively, which are also the number of points of 
the four corresponding adaptive sparse grids. 

The accuracy of a surrogate modeling approach based on the AHSG method is 
largely determined by the smoothness of the surrogate system, so it is worth examin¬ 
ing the surface of the output data in the parameter space. For brevity, we only plot 
here the vertical Reynolds stress data at the centerline, i.e., (v'v')(0.65,0.7), which 
is among the most fluctuating (See Figure 3). Figure 4 represents some surfaces for 
typical values of filter width generated on level 8 grid. We observe that the surface ac¬ 
cording to 5 = ;r/200 differs from two other cases (S = 7r/600, S = tt/SOO) that are 
remarkably rougher. This, together with Figure 3, confirms the connection between 
the complexity of the output function in both the physical and parameter spaces. In 
Figure 5, the scatter plots for the predicted outputs obtained with the surrogate system 
at level 7 are presented. The approximations show clear improvement in accuracy 
with S e [;r/600, ;r/300], compared to those at larger values. While not considered 
herein, it is reasonable to expect that the surrogate outputs at least maintain the same 
accuracy for S < 7r/600, since more grid refinement will remove extra non-physical 
wiggles. In the next part, we justify that this level of accuracy is sufficient for our 
surrogate-based MCMC method. Although the surrogate systems show remarkable 
discrepancy for large 5, as previously mentioned, these values, leading to visibly 
inadequate outputs, should be excluded in practical calibration processes. While the 
original domain of S is [;r/600, ;r/200], by choosing its true value as ;r/480, the 
effective searching region of S is restricted to [;r/600, ;r/300]. 

To evaluate the accuracy and efficiency of our surrogate modeling approach, the 
DRAM-based MCMC simulations using the surrogate PPDF P(0jd) constructed 
in Algorithm 2.1 are conducted. Each MCMC simulation draws 60,000 parameter 
samples, the first 10,000 of which are discarded and the remaining 50,000 samples are 
used for estimating the PPDF. For the first experiment, (MVN) likelihood function 
is employed; the data are corrupted by 10% Gaussian random noise, treated as 
numerical errors. Figure 6 plots the marginal PPDFs where the three parameters 
are normalized such that the searching region is [—1,1]^. The black vertical lines 
represent the true values listed in Table 1. The red solid lines are the marginal PPDFs 
estimated by MCMC simulations based on the surrogate systems on level 8 grid, and 
the dashed lines represent those based on the surrogate systems on lower levels. The 
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8 = jt/200 8 = jt/300 



Fig. 4 Surfaces of the predicted vertical Reynolds stress data at (0.65,0.7) generated by the AHSG 
method at level 8. Cs and p are normalized such that their searching regions are [—1,1]. 

5 = 71/200 5 £ [71/600,71/300] 




Fig. 5 Scatter plots for the prediction of the output data given by the surrogate system on level 7 
sparse grid. 

figure indicates that the MCMC results according to level 7 and level 8 sparse grids, 
which require 1002 and 2190 model executions correspondingly, are already close 
to each other. Thus, the surrogate PPDF on level 8 is accurate enough for MCMC 
simulations. 

We proceed to compare the accuracy of the surrogate-based with the conventional 
MCMC with equal computational effort, i.e., same number of model executions. 
Due to the high computational cost, a proper conventional MCMC simulation is not 
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Fig. 6 Marginal posterior probability density functions of three Smagorinsky model parameters 
with (MVN) likelihood model estimated using the linear surrogate systems on level 5-8 adaptive 
sparse grids. 

conducted in this work. However, given the accuracy of the surrogate system, we 
expect that marginal PPDFs obtained from conventional MCMC are very close with 
those from surrogate-based MCMC on high-level grid and therefore, run the MCMC 
simulation with samples drawn from level 8 surrogate. The first 10000 samples are 
discarded to minimize the effect of initial values on the posterior inference. Figure 
7 depicts the marginal PPDFs for model parameters obtained with 1002, 2190 and 
50000 samples after burn-in period. Let us remark that if conventional MCMC is 
employed, these are the numbers of model executions required to obtain similar 
results. Comparing Figure 6 and 7 indicates that with the same number of model 
executions, the approximations using surrogate system are more accurate than those 
using conventional MCMC, highlighting the efficiency of our surrogate modeling 
method. 




Fig. 7 Marginal posterior probability density functions of model parameters with (MVN) likelihood 
function estimated using the linear surrogate systems on level 8 adaptive sparse grids with 1002, 
2190 and 50000 samples (excluding 10000 samples for burn-in period). These are the numbers of 
model executions that the conventional MCMC requires to obtain similar results. 


In order to demonstrate that our adaptive refinement strategy based on the smooth¬ 
ness of output data in probability space allows the change of likelihood models with 
minimal computational cost, we perform the above experiment with (EXP) likelihood 
function and ^ = 500 using the same surrogate of outputs. The marginal PPDFs of 
model parameters estimated using the linear surrogate systems are shown in Figure 
8 . Again, they can be compared with marginal PPDFs estimated using conventional 
MCMC with the same number of model executions in Figure 9. The plots confirm 
the accuracy of the surrogate PPDF for MCMC simulations and that surrogate-based 
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MCMC requires less forward model executions than the conventional approach. On 
the other hand, it should be noted that some likelihood models, especially those 
resulting in peaky PPDFs, may require a surrogate system more accurate than that on 
level 8 sparse grid. In those cases, the surrogate needs to be constructed on a grid of 
higher level. 




Fig. 8 Marginal posterior probability density functions of three Smagorinsky model parameters 
with (EXP) likelihood model estimated using the linear surrogate systems on level 5-8 adaptive 
sparse grids. 




Fig. 9 Marginal posterior probability density functions of model parameters with (EXP) likelihood 
function estimated using the linear surrogate systems on level 8 adaptive sparse grids with 1002, 
2190 and 50000 samples (excluding 10000 samples for burn-in period). These are the numbers of 
model executions that the conventional MCMC requires to obtain similar results. 


The calibration results for both likelihood models show that the Smagorinsky 
constant Cs and van Driest damping parameter p have posterior maximizers near 
their true values, while smaller values are somewhat preferred for the filter width 5. 
Meanwhile, the posterior distribution of 5 is peaky, indicating that the data depend 
on 5 and the Smagorinsky models with our selections of filter width (and spatial 
resolution) are incomplete. Indeed, finer grids are needed to sufficiently resolve the 
energy. The plots also reveal that the data are significantly more sensitive with respect 
to 5 than to other parameters. This elucidates why finding the optimal value for 5, 
i.e., determining the ideal place to truncate scale, is a very important issue in LES 
practice. Finally, the positive correlation between Cs and p can be observed in Figure 
10, in which the posterior samples projected on the (Q,/?)-plane are plotted. Given 
that our calibration data are extracted in near wake region, this correlation is expected. 
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As larger value of Cs increases the Smagorinsky lengthscale ^ 5 , larger p would be 
needed for a stronger damping of £s near the boundary. 



Fig. 10 Two-dimensional marginal posterior probability density function of Cs and p with (MVN) 
likelihood model. The MCMC samples are obtained using the linear surrogate system on level 8 
sparse grid. 


4 Conclusion 

In this paper, we present a surrogate modeling approach based on the AHSG method 
for Bayesian inference, with application to quantification of parametric uncertainty 
of LES turbulence models. The method is based on those developed in [62] for less 
complex geophysical and groundwater models, is model independent and can be 
flexibly used together with any MCMC algorithm and likelihood function. The accu¬ 
racy and efficiency of our approach is illustrated by virtue of the numerical example 
consisting of the Smagorinsky model of two-dimensional flow around a cylinder. We 
combine the hierarchical linear basis and the local adaptive sparse grid technique to 
construct surrogate systems with a small number of model executions. Although the 
forward model investigated herein is highly nonlinear and more complicated than 
those in previous studies, our analysis indicates that the surrogate system is accu¬ 
rate for reasonable specifications of search regions. Compared to the conventional 
MCMC simulation, our surrogate-based approach requires significantly less model 
executions for estimating the parameter distribution and quantifying predictive uncer¬ 
tainty. Civen the extremely high cost of turbulence simulations, this computational 
efficiency is critical for the feasibility of Bayesian inference in turbulence modeling. 

While the performance of surrogate modeling method is evaluated in this work 
for a synthetic cylinder flow model on relatively coarse grids, we expect comparable 
results for practical, more complicated calibration and prediction problems using real- 
world data; since three-dimensional, more refined simulations and real experiments 
of these flows are known to produce similar patterns to the investigated physical 
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outputs in this study. Still, a three-dimensional demonstration of our surrogate-based 
approach for these problems is irreplaceable and would be the next logical step. The 
framework presented here could be directly applied to other engineering flow models, 
as well as to the tasks of quantifying the structural uncertainties and comparing 
competing turbulence closure models. The accuracy of surrogate-based MCMC in 
these cases needs to be tested, but the verification, which is much less computational 
demanding than running the conventional MCMC, is possibly worthwhile. Finally, 
besides our AHSG, several other methods can be employed to construct the surrogate 
system. A thorough comparative assessment with those methods is essential to 
fully justify the efficiency of our approach in turbulence uncertainty quantification 
problems and would be considered in the future. 

Concerning sparse grid interpolation methods, additional research in accelerating 
the convergence rate of the surrogate is necessary. One direction is high-order sparse 
grid methods, which utilize high-order (instead of linear) hierarchical polynomial 
basis and whose superior efficiency has been justified for uncertainty quantification of 
groundwater models [62]. On the other hand, given that the outputs and PPDFs do not 
experience same level of sensitivity to different calibration parameters, combining 
locally grid refinement strategy with dimension-adaptive sparse grid methods to 
further reduce the number of interpolation points is worth studying. 
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